Enforcing Subcategorization Constraints in a Parser Using Sub-parses Recombining

Authors

  • Seyed Abolghasem Mirroshandel
  • Alexis Nasr
  • Benoît Sagot
Abstract

Treebanks are not large enough to adequately model the subcategorization frames of predicative lexemes, which are an important source of lexico-syntactic constraints for parsing. As a consequence, parsers trained on such treebanks usually make mistakes when selecting the arguments of predicative lexemes. In this paper, we propose an original way to correct subcategorization errors by combining subparses of a sentence S that appear in the list of the n-best parses of S. The subcategorization information comes from three different resources: the first is extracted from a treebank, the second is computed on a large corpus, and the third is an existing syntactic lexicon. Experiments on the French Treebank showed a 15.24% reduction in erroneous subcategorization frame (SF) selections for verbs, as well as a 4% relative reduction in the Labeled Accuracy Score error rate of the state-of-the-art parser on this treebank.
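
The idea of recombining subparses from an n-best list can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the parse encoding, the label names (`suj`, `obj`, `a_obj`, `de_obj`, `obj.p`), the toy lexicon, and the functions `frame`, `is_tree`, and `recombine` are all assumptions made for illustration. When the verb's frame in the best parse is not licensed by the lexicon, the sketch copies the attachments around that verb from the highest-ranked alternative whose frame is licensed, and keeps the rest of the best parse.

```python
from typing import Dict, List, Set, Tuple

# A parse is a list of (head, label) pairs, one per token. Tokens are numbered
# from 1 and head 0 is the artificial root. Labels are illustrative assumptions.
Parse = List[Tuple[int, str]]
Frame = Tuple[str, ...]

ARG_LABELS = {"suj", "obj", "a_obj", "de_obj"}  # hypothetical argument labels


def frame(parse: Parse, verb: int) -> Frame:
    """Subcategorization frame of `verb`: sorted labels of its argument dependents."""
    return tuple(sorted(l for h, l in parse if h == verb and l in ARG_LABELS))


def is_tree(parse: Parse) -> bool:
    """True if every token reaches the root (0) without running into a cycle."""
    for tok in range(1, len(parse) + 1):
        seen, node = set(), tok
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = parse[node - 1][0]
    return True


def recombine(nbest: List[Parse], verbs: Dict[int, str],
              lexicon: Dict[str, Set[Frame]]) -> Parse:
    """Start from the best parse; repair each verb whose frame is not licensed."""
    result = list(nbest[0])
    for verb, lemma in verbs.items():
        allowed = lexicon.get(lemma, set())
        if frame(result, verb) in allowed:
            continue  # the frame is already licensed, nothing to repair
        for alt in nbest[1:]:
            if frame(alt, verb) not in allowed:
                continue
            candidate = list(result)
            for i, (head, label) in enumerate(alt, start=1):
                # Copy the attachment of every token that depends on the verb
                # in either parse, so the verb ends up with the frame of `alt`.
                if head == verb or result[i - 1][0] == verb:
                    candidate[i - 1] = (head, label)
            if is_tree(candidate):
                result = candidate
                break
    return result


# Toy sentence "Jean donne un livre à Marie" (tokens 1..6).
best = [(2, "suj"), (0, "root"), (4, "det"), (2, "obj"), (4, "dep"), (5, "obj.p")]
alt = [(2, "suj"), (0, "root"), (4, "det"), (2, "obj"), (2, "a_obj"), (4, "obj.p")]
lexicon = {"donner": {("a_obj", "obj", "suj")}}
fixed = recombine([best, alt], {2: "donner"}, lexicon)
```

In this toy case, `fixed` takes the `a_obj` attachment of "à" from the second parse but keeps the best parse's attachment of "Marie", so the output is a combination of the two parses rather than a copy of either.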

Similar resources

Integrating Selectional Constraints and Subcategorization Frames in a Dependency Parser

Statistical parsers are trained on treebanks that are composed of a few thousand sentences. In order to prevent data sparseness and computational complexity, such parsers make strong independence hypotheses on the decisions that are made to build a syntactic tree. These independence hypotheses yield a decomposition of the syntactic structures into small pieces, which in turn prevent the parser ...

Estimating Probabilities for an Indonesian Stochastic Parser using the Inside Outside Algorithm

This paper presents work in constructing a Probabilistic Context Free Grammar (PCFG) parser for Indonesian. Due to the unavailability of a large manually parsed corpus, we start from an existing symbolic parser to parse a relatively small collection of Indonesian sentences. A PCFG language model is extracted, ignoring explicit linguistic information encoded in feature structures, and is subsequ...

Rules and Constraints in a French Finite-State Grammar

This report describes the rule system of a robust finite-state parser implemented for French. The parser attaches syntactic tags to each word as well as part-of-speech and morphological tags, and determines clause boundaries. It is a reductionist parser, i.e., it removes readings from the originally ambiguous text. The underlying parser is based on finite-state networks and their intersection. We des...

Semantic Role Labeling of Persian Sentences Using a Memory-Based Learning Approach

Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and syntactic constituents in a sentence. In this paper we present a semantic role labeling system for Persian, using a memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...

Robust Parsing Based on Discourse Information: Completing Partial Parses of Ill-Formed Sentences on the Basis of Discourse Information

In a consistent text, many words and phrases are repeatedly used in more than one sentence. When an identical phrase (a set of consecutive words) is repeated in different sentences, the constituent words of those sentences tend to be associated in identical modification patterns with identical parts of speech and identical modifiee-modifier relationships. Thus, when a syntactic parser cannot pa...

Publication date: 2013